[SPARK-46294][SQL] Clean up semantics of init vs zero value#44222
Closed
davintjong-db wants to merge 5 commits intoapache:masterfrom
Closed
[SPARK-46294][SQL] Clean up semantics of init vs zero value#44222davintjong-db wants to merge 5 commits intoapache:masterfrom
davintjong-db wants to merge 5 commits intoapache:masterfrom
Conversation
cloud-fan
reviewed
Dec 14, 2023
| // values before calculate max, min, etc. | ||
| private[this] var _value = initValue | ||
| private var _zeroValue = initValue | ||
| class SQLMetric(val metricType: String, |
Contributor
There was a problem hiding this comment.
Suggested change
| class SQLMetric(val metricType: String, | |
| class SQLMetric( | |
| val metricType: String, |
Contributor
There was a problem hiding this comment.
4 spaces indentation for multi-line parameters
cloud-fan
reviewed
Dec 14, 2023
| // This is used to filter out metrics. Metrics with value equal to initValue should | ||
| // be filtered out, since they are either invalid or safe to filter without changing | ||
| // the aggregation defined in [[SQLMetrics.stringValue]]. | ||
| override def isZero: Boolean = _value == initValue |
Contributor
There was a problem hiding this comment.
this is a bit tricky as isZero is not true when we actually have the zero value...
Contributor
There was a problem hiding this comment.
Let's enrich the comment to highlight that, we may want to collect the 0 value for calculating min/max/avg. We can still link to SPARK-11013.
cloud-fan
approved these changes
Dec 14, 2023
Contributor
|
thanks, merging to master! |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What changes were proposed in this pull request?
Cleaning up the semantics of init and zero value to the following. This also helps define what an "invalid" metric is.
initValue is the starting value for a SQLMetric. If a metric has value equal to its initValue, then it can/should be filtered out before aggregating with SQLMetrics.stringValue().
zeroValue defines the lowest value considered valid. If a SQLMetric is invalid, it is set to zeroValue upon receiving any updates, and it also reports zeroValue as its value to avoid exposing it to the user programatically (concern previouosly addressed in SPARK-41442).
For many SQLMetrics, we use initValue = -1 and zeroValue = 0 to indicate that the metric is by default invalid. Whenever an invalid metric is updated, it sets itself to zeroValue and becomes valid. Invalid metrics will be filtered out when calculating min, max, etc. as a workaround for SPARK-11013.
Why are the changes needed?
The semantics of initValue and _zeroValue in SQLMetrics is a little bit confusing, since they effectively mean the same thing. Changing it to the following would be clearer, especially in terms of defining what an "invalid" metric is.
Does this PR introduce any user-facing change?
No. This shouldn't change any behavior.
How was this patch tested?
Existing tests.
Was this patch authored or co-authored using generative AI tooling?
No.